A Hitchhiker's guide to RNA–RNA structure and interaction prediction tools

Abstract RNA biology has risen to prominence following the discovery of diverse functions of noncoding RNA (ncRNA). Most untranslated transcripts exert their regulatory functions by forming RNA–RNA complexes via base pairing with complementary sequences in other RNAs. The interplay between RNAs is essential, as it underlies various functional roles in human cells, including genetic translation, RNA splicing, editing, ribosomal RNA maturation, RNA degradation and the regulation of metabolic pathways/riboswitches. Moreover, the pervasive transcription of the human genome allows for the discovery of novel genomic functions via RNA interactome investigation. The advancement of experimental procedures has resulted in an explosion of documented data, necessitating the development of efficient and precise computational tools and algorithms. This review provides an extensive update on RNA–RNA interaction (RRI) analysis via thermodynamic- and comparative-based RNA secondary structure prediction (RSP) and RNA–RNA interaction prediction (RIP) tools and their general functions. We also highlight the current knowledge of RRIs and the limitations of RNA interactome mapping via experimental data. Then, the gap between RSP and RIP, the importance of RNA homologues, and the relationship between pseudoknots and RNA folding thermodynamics are discussed. It is hoped that these emerging prediction tools will deepen the understanding of RNA-associated interactions in human diseases and hasten treatment processes.


INTRODUCTION
More than 60 years ago, the central dogma of molecular biology was first introduced by Francis Crick as a model to describe the transfer of genetic information from DNA to protein [1]. Since then, several attempts have been made to interpret the composition of RNA subtypes in the human genome and their roles in protein synthesis [2,3]. Typically, Watson-Crick base-pairing is known to maintain the genetic continuity of RNA replication, and encoded proteins are not involved as catalysts [1,4]. The adaptability of RNA molecules has spawned the 'RNA World' hypothesis, in which RNA replication-based evolution takes precedence over DNA-centred evolution and protein synthesis [2][5][6][7]. The 'RNA World' hypothesis depicts the possibility of storing genetic material via RNA alone and its ability to self-replicate as the primary source of catalytic mechanisms without the involvement of proteins [8][9][10][11][12][13][14][15][16][17]. Since its discovery in the 1960s, protein-encoding messenger RNA (mRNA) has received a great deal of attention due to its critical function in protein synthesis and is considered the indispensable intermediary in producing proteins [18]. Nevertheless, high-throughput sequencing platforms have created a paradigm shift by showing that over 90% of the human genome is transcribed into RNA [18,19]. Of this, 2% encodes proteins, while the remainder is transcribed into nonprotein-coding RNA (also known as noncoding RNA or ncRNA) molecules [20][21][22][23][24]. In summary, advances in sequencing technology have enabled the discovery of ncRNAs, bringing RNA biology to the forefront and revealing the intricate role of ncRNAs in human cells [25][26][27][28].
Despite extensive functional studies, the molecular mechanisms of ncRNA-centric roles remain elusive and require advances in experimental biomedicine [34][35][36]. However, emerging RNA-RNA interaction (RRI) tools offer promise in reducing experimental efforts. Understanding these mechanisms requires investigating ncRNA interactions with cellular components such as proteins, DNA sites and other RNAs [37]. Remarkably, numerous classical ncRNAs communicate with other RNA subtypes, either directly via base pairing or indirectly via protein intermediates. Examples include transfer RNA-messenger RNA (tRNA-mRNA) interactions to translate genetic code; miRNA-mRNA interactions to stimulate mRNA degradation; and mRNA-protein interactions involving RNA splicing, editing and ribosomal RNA maturation [38][39][40][41]. These findings imply that RRIs represent a universal strategy utilized by many ncRNAs, and completely mapping these interactions could provide insight into ncRNA functions and mechanisms. The RNA interactome has emerged as a central component of many regulatory processes, prompting extensive research from both wet lab and computational researchers [42][43][44][45]. Nonetheless, mapping RRIs remains challenging, as current methods struggle to identify and differentiate between direct and indirect RRIs and may have limited resolution for specific RNA examination.

TYPES OF INTERACTIONS
RNA molecules are not just passive carriers of genetic information; they actively participate in various cellular processes through their interactions with other molecules [46]. Understanding these roles and interactions is crucial for advancing our knowledge of cellular biology. RNA molecules interact with other RNAs, proteins and DNA to carry out their functions.
RNA-DNA interactions are essential for several biological processes. One of the most well-known examples is transcription, where an RNA molecule is synthesised based on the DNA template. Another example is the process of reverse transcription in retroviruses, where viral RNA is reverse transcribed into DNA. For instance, in RNA interference (RNAi), small RNA molecules can bind to complementary sequences in mRNA molecules, leading to their degradation and thus preventing their translation into proteins [47]. Another example is the clustered regularly interspaced short palindromic repeats (CRISPR) system, a bacterial defense mechanism that has been adapted for genome editing. In this Nobel Prize-winning system, RNA molecules guide the Cas9 nuclease to specific locations in the DNA, allowing precise cuts to be made [48]. More recent studies have also highlighted the role of ncRNAs in regulating chromatin architecture via interaction with DNA or chromatin-associated proteins to modulate gene expression. Some ncRNAs function through the formation of R-loops with the complementary sequence from their transcribed loci and affect local gene expression [49].
RNA-protein interactions are fundamental to cellular processes and play a crucial role in the life cycle of an RNA molecule, from its synthesis and processing to its eventual function in protein synthesis. Proteins can bind to RNA to form ribonucleoprotein complexes, which are involved in various aspects of RNA metabolism, including splicing, polyadenylation, stability, transport, and translation [50]. The spliceosome, a large ribonucleoprotein complex, is responsible for removing introns from pre-mRNA, a process known as splicing, and is crucial for the maturation of mRNA molecules and their subsequent translation into proteins [51]. RNA-protein interactions also play a role in polyadenylation, the addition of a poly(A) tail to the 3′ end of an mRNA molecule that enhances the stability of the mRNA and facilitates its export from the nucleus and transport within the cell [52]. During protein translation, mRNA molecules interact with ribosomes, which are themselves ribonucleoprotein complexes, to synthesize proteins based on the sequence of the mRNA that determines the sequence of amino acids in the protein [53].
RNA also interacts with other RNA. For instance, RNA molecules can form complex secondary and tertiary structures through interactions with other RNA molecules, and these structures are critical for the function of many types of RNA, including ribosomal RNA (rRNA), transfer RNA (tRNA) and mRNA [54]. In the ribosome, which is a complex of rRNA and proteins, mRNA and tRNA interact to facilitate protein synthesis. The rRNA provides the structural framework for the ribosome and contributes to its catalytic activity [55]. RRIs also play a role in the regulation of gene expression. For instance, miRNAs can base-pair with target mRNAs to repress their translation or induce their degradation [56]. Dysregulation of RRIs can lead to various diseases. For example, mutations that affect the secondary structure of RNA can disrupt normal RRIs and lead to diseases such as cancer [57]. Therefore, tools that can predict and analyse these interactions are of great importance in advancing our knowledge of cellular biology and developing therapeutic strategies.

Types of RNA-RNA interactions
There are two main types of interactions in RNA molecules, namely, cis-only and trans RRIs (Figure 1). The former is defined as the intramolecular base pairing between nucleotides within a single RNA molecule (Figure 1B) [58]. It permits canonical Watson-Crick base-pairing between {adenine (A) and uracil (U)} and {guanine (G) and cytosine (C)} and non-Watson-Crick/wobble base-pairing between {guanine (G) and uracil (U)} (formed by edge-to-edge hydrogen bonding interactions between the bases) (Figure 1A) [59][60][61]. The intramolecular RRI aids in the formation of short double-stranded helices and allows folding into specific 3D structures such as tRNA and mRNA, which form the basis for molecular recognition events [62,63].
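The pairing rules listed above can be written down as a small lookup. The following is a minimal sketch; the names `PAIRS`, `can_pair` and `pair_class` are hypothetical helpers introduced here for illustration and do not come from any of the reviewed tools.

```python
# Canonical Watson-Crick pairs (A-U, G-C) and the G-U wobble pair,
# stored in both orientations so lookups are order-independent.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}
PAIRS = WATSON_CRICK | WOBBLE


def can_pair(x: str, y: str) -> bool:
    """Return True if nucleotides x and y may form a base pair."""
    return (x.upper(), y.upper()) in PAIRS


def pair_class(x: str, y: str) -> str:
    """Label a nucleotide pair as 'watson-crick', 'wobble' or 'none'."""
    key = (x.upper(), y.upper())
    if key in WATSON_CRICK:
        return "watson-crick"
    if key in WOBBLE:
        return "wobble"
    return "none"
```

The same rule set applies whether the two nucleotides sit on one strand (cis-only RRI) or on two strands (trans RRI); only the bookkeeping of positions differs.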
On the other hand, trans RRI is made up of two or more RNAs that interact intermolecularly via Watson-Crick base pairing, wobble base pairing or helical stacking (Figure 1C) [64,65]. miRNAs, for example, can target the 3′ untranslated regions (3′ UTRs) of mRNAs [66][67][68], whereas spliceosomal small nuclear RNAs (snRNAs) recognize the intronic regions of pre-mRNAs [69,70]. Duplex formation through base pairing of complementary nucleotides leads to naturally occurring RRIs. They are crucial for various processes, including RNA cleavage, RNA editing, RNA modification, RNA splicing, RNA translation, suppression of RNA translation and RNA degradation [71][72][73][74][75]. Additionally, base-pair interactions are crucial for semiconservative replication, energetically favourable arrangement of base pairs, and the formation of helical RNA structures [76]. Intramolecular interactions lead to the formation of RNA secondary structures, which is why researchers commonly refer to the prediction of cis-only RRIs as the method for RNA structure prediction (RSP). To summarise, intramolecular interactions form secondary RNA structures (cis-only RRI), while intermolecular interactions occur when two individual RNAs interact (trans RRI).
Figure 1. ... where both RRI types are involved. Inter- and intramolecular base pairs are indicated by vertical pipe symbols and arches, respectively (adapted from [25]).
Predicting RRIs based solely on intra- or intermolecular interactions presents significant challenges due to the diverse conformations [77,78] and conformational changes of RNA molecules [79][80][81]. Complexities also arise from the three-dimensional folding, secondary structures [82] and tertiary interactions of RNA molecules [83]. Therefore, focusing exclusively on one type of RNA interaction may result in the oversight of crucial interactions occurring across different regions of an RNA molecule [84]. Nonetheless, concatenating both intra- and intermolecular RNA interactions (Figure 1D) permits a more comprehensive analysis, capturing a broader range of interactions and revealing complex RNA networks. This integrated approach provides a more realistic representation of RRIs in biological systems and offers insights into their contribution to overall RNA architecture. Utilizing both types of interactions for prediction provides a more robust and holistic framework compared to relying on either one alone.
RRIs are modelled at various levels of complexity, depending on their common and distinguishing features, which are translated into sophisticated computational algorithms. Complexity refers to the intricacy and sophistication of the computational approach used to model RRIs. However, current RRI models cannot account for real-time biological and chemical information in the cellular environment, except at a coarser level of detail [85]. These models typically focus on sequence complementarity, thermodynamic stability, or structural motifs, which may not fully capture the intricacies of the cellular context [86]. Using RSP-like algorithm tools could facilitate RRI prediction (RIP) by providing reliable information on interacting nucleotide positions, revealing potential biological roles and regulatory mechanisms of mRNAs and ncRNAs [87]. In short, there is a need for RSP-like algorithms to better understand RNA sequences and their interactions in real time, improving RIP models and gaining deeper insights into their biological significance.

RNA-RNA INTERACTION MAPPING VIA EXPERIMENTAL DATA: LIMITATIONS AND TECHNIQUES
The secondary structure of ncRNA serves as a scaffold for the tertiary structure and facilitates catalytic and ligand binding interactions with various RNAs [33,44,88]. RIP tools use similar ideas and algorithms to predict RNA secondary structures. X-ray crystallography (single crystal X-ray diffraction (XRD)) and nuclear magnetic resonance (NMR) spectroscopy are the most accurate and robust conventional methods for detecting three-dimensional (3D) RNA structures [89,90]. Although XRD provides high atomic resolution with no size limitations, crystallizing 3D RNA structures is challenging. Conversely, NMR excels when crystallization is impossible and provides solution-state dynamics but has limitations on molecular weights (<50 kDa) [91]. Combining XRD and NMR results in a more accurate structure determination method, providing ncRNA structural information at single base-pair resolution [92,93]. Nonetheless, their widespread application is hampered by high experimental costs, low throughput, limited ncRNA resolution measurements, structure detection in vitro that is difficult to translate to the in vivo conformation, and the fact that <0.001% of ncRNAs have been identified from experimental data [94].
Numerous sequencing-based systems have been developed over the last decade for the experimental identification of RNA interactomes. However, current RRI mapping methods, such as RNA interactome analysis and sequencing (RIA-Seq) and RNA antisense purification (RAP)-Seq, do not directly assay RNA interactomes [95,96]. Instead, they rely on anchored RNAs and molecular perturbations to identify interaction targets of specific RNAs [97]. This one-RNA-at-a-time approach makes it challenging to comprehensively identify all RRIs. Following this, several high-throughput techniques, including PARIS [98], SPLASH [99], LIGR-Seq [100] and MARIO [97], have been introduced. They map entire RNA interactomes in vivo, in addition to identifying interacting partners of specific target RNAs at a larger scale. Online databases such as RAID v2.0 [101], NPinter [102][103][104], RNAinter [105,106] and RISE [107] organise and classify these RRIs based on curated data from various sources (bibliometrics, experimental data, etc.). Nevertheless, a complete picture of human RNA-associated interactions is lacking, with most observed interactions associated with ribosomal and small RNAs rather than ncRNAs. Tissue-specific expression patterns of RNAs require numerous repetitions of in vivo experiments to detect genome-wide RNA interactomes [20,108]. Therefore, computational RIP methods remain an indispensable complement to experimental approaches.

STATE-OF-THE-ART APPROACHES FOR RNA STRUCTURE AND INTERACTOME PREDICTION
Computational prediction methods are widely used for identifying RRIs. The discovery of the minimum free energy (MFE) structure of RNA sequences has garnered attention due to its association with RNA secondary structures and folding stability. The MFE of an RNA depends on the sequence length, nucleotide content/composition and nucleotide order/arrangement [109]. Longer RNA sequences tend to be more stable due to stacking and hydrogen bond interactions [110]. The composition of nucleotides also influences RNA stability; G-C-rich sequences are more stable than A-U-rich sequences due to additional hydrogen bonds. The specific arrangement of nucleotides, including loop numbers and double helix conformations, contributes to folding structure stability [109].
This review aims to summarise popular computational prediction tools for RIP based on two main strategies: the deterministic dynamic programming (DDP) approach and comparative sequence analysis (homology), as illustrated in Figure 2A [85][86][87][111][112][113]. This landscape reflects the growing interest and extensive research in the field of RIP. Figure 2B showcases the relationships between these two strategies.

Deterministic dynamic programming algorithm for individual RNA structure and RNA-RNA interaction prediction
The DDP algorithm is a popular and accurate type of RIP that relies on the thermodynamic model. It uses free energy minimization to predict RNA secondary structure based on a single sequence with a known function as an input [114]. Its predictions can be complemented by chemical probing, in which nucleotides at Watson-Crick pairing sites in folded RNA are chemically altered using reagents such as dimethyl sulfate and kethoxal. It is known as a "score-based method" that interprets the native RNA structure as the one with the minimum/maximum total score of the RNA folding prediction.
This approach relies on experimental approximations to account for the influence of sequence on stability for different RNA motifs. However, it does not account for pseudoknots, which are RNA structures formed by two nonnested base pairs. The nearest-neighbour model considers directly neighbouring bases and base pairs for each interaction [115,116], utilizing loop-specific energy contributions to determine loop type- and context-specific contributions to the RNA structure [114,117,118].

Nussinov algorithm
The application of DDP in RSP ensures efficient computation [119,120], producing consistent and identical results for identifying the lowest free energy structure. DDP simplifies complex RNA structures into simpler substructures through mathematical optimization and computer programming [119]. Several variants of the DDP algorithm exist, as reported in Figure 2A. The Nussinov algorithm is the first DDP algorithm that efficiently predicts the optimal folding state of an RNA molecule by computing the maximum number of base-pairings [121]. However, it has several biases that can be noted as limitations. For instance, it (i) disregards differences in base-pairing strengths and the influence of loop sizes, base-pair stackings, loop context, multiloop, and pseudoknot formations on stability; (ii) lacks approximation-based prediction algorithms that cause the inability to predict pseudoknotted helices; (iii) does not consider folding kinetics, which does not apply to secondary RNA structures; (iv) exhibits asymmetry in the distribution of unpaired nucleotides, leading to destabilization of multibranch loops/helical junctions; (v) shows discontinuity in the formed base pairs; and (vi) is unable to create stem regions, thereby reducing its prediction accuracy [114,122].
Figure 3. Loop decomposition of a nested RNA structure into hairpin loops (no enclosed base pairs), stackings (adjacent enclosed base pairs), bulges (only one side adjacent to enclosed base pair), multibranched loops (more than one directly enclosed base pair), interior loops (no stacked enclosed base pairs), pseudoknots (nucleotides in a loop pair with a region outside the helices that close the loop) and stem-loops (combination of the stem, double helix, and a loop) (adapted from [256]).
To address this, a minimum free energy (MFE) algorithm based on the Nussinov algorithm and nearest-neighbour model was proposed by Zuker in 1981 [123].
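The base-pair maximisation idea behind the Nussinov algorithm can be illustrated with a short dynamic programme. This is a minimal sketch rather than a reference implementation: the function name `nussinov` and the `min_loop` parameter (a minimum hairpin loop size, a common practical addition) are assumptions made for this example, and G-U wobble pairs are allowed alongside Watson-Crick pairs.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Return (max_pairs, dot_bracket) for one sequence by pair maximisation."""
    n = len(seq)
    # dp[i][j] = maximum number of nested base pairs within seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])      # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:               # i pairs with j
                best = max(best, dp[i + 1][j - 1] + 1)
            for k in range(i + 1, j):                   # bifurcation
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best

    # Traceback: recover one optimal structure in dot-bracket notation.
    structure = ["."] * n
    stack = [(0, n - 1)] if n else []
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and dp[i][j] == dp[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return (dp[0][n - 1] if n else 0), "".join(structure)


print(nussinov("GGGAAAUCC"))  # (3, '(((...)))')
```

Every pair contributes the same score of 1 here, which makes limitation (i) visible: a G-C pair, a G-U pair and a stacked helix are all treated alike, which is what energy-based models correct.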

Minimum free energy algorithm
MFE algorithms, based on DDP, compute a series of complex free-energy parameters obtained from experimental methods. One example is the optical melting experiment that measures the thermodynamics of nucleotides. These algorithms break down a secondary RNA structure into substructures known as nearest-neighbour loops (Figure 3). The free energy of each nearest-neighbour loop is computed by adding its specific free energy parameters. The MFE approach can be categorised into four subclasses based on criteria, including intramolecular base pairs (internal structure), neglect of intramolecular structure, accessibility of the binding region, and the ability to predict the joint secondary structure of RNA duplexes [124].
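The additive loop decomposition described above can be stated compactly. The notation is introduced here for illustration only: $S$ is a nested secondary structure, $\mathrm{loops}(S)$ its set of nearest-neighbour loops, $\Delta G_L$ the tabulated free energy of loop $L$, and $\Omega$ the set of admissible structures for the sequence.

```latex
\Delta G(S) = \sum_{L \in \mathrm{loops}(S)} \Delta G_{L},
\qquad
S_{\mathrm{MFE}} = \operatorname*{arg\,min}_{S \in \Omega} \Delta G(S)
```

Because the total energy is a sum over independent loops, the minimisation can be solved by dynamic programming over subsequences rather than by enumerating every structure in $\Omega$.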
This review provides an overview of MFE algorithms derived from RSP and used in RIP tools to predict the RNA interactome in Tables 1-3 [42,113,114,125]. It outlines the main prediction and output strategies employed by each algorithm. 'Conservation' indicates whether the prediction tools accept sequence alignments as input, which can help in identifying conserved regions within RNA molecules. 'Suboptimal' indicates whether the algorithms report suboptimal results in addition to a single MFE prediction. This feature allows the exploration of alternative RNA secondary structures whose free energy is close to, but higher than, that of the MFE structure and which may remain biologically relevant. The length of the interaction estimates the size of the predicted RNA-RNA helices, categorized as short (≤12 base pairs) or long (>12 base pairs). Additionally, the tables distinguish between local interactions and global predictions. 'Local interactions' involve single interactions with gaps and bulges, limited to a few base pairs. These predictions focus on aligning local regions with high similarity. In contrast, 'global predictions' span the entire RNA sequence, including multiple instances of local interactions separated by longer regions lacking intermolecular base pairs.

Interaction-only approach
The first RIP method is known as the 'interaction-only (IO)' approach because it only considers intermolecular base pairs during computation and in the final predicted outcome [87]. The MFE derived from intermolecular base pairs between two RNA strands is called the hybridization energy. IO possesses fast algorithmic speed but lower accuracy, as it neglects intramolecular RNA structures that might disrupt and constrain certain intermolecular interactions. IO prediction tools compute the overall Gibbs free energy (ΔG) and determine the direction of RNA folding. The stable RNA structure is determined by minimizing free energy using thermodynamic data such as temperature and chemical composition. The goal is to find the structure with the lowest Gibbs free energy, indicating its most stable conformation under the given thermodynamic conditions. Examples include DuplexFold [126], targetRNA [127], RNAhybrid [126], RNAplex [128], RNAduplex, RNAaliduplex [125], RIsearch [129] and GUUGle [130] (Table 1).
The DuplexFold server predicts the lowest hybrid free energy conformation of two RNA sequences based on intermolecular base-pairing, whereas targetRNA identifies base-pair complementarity and calculates RRI scores using the MFE model for RNA duplexes [127]. Following targetRNA, RNAhybrid predicts eukaryotic miRNA target and prokaryotic sRNA target interactions [126]. Both targetRNA and RNAhybrid heavily rely on the energies of stacked back-to-back base pairs, interior loops, and bulges for their prediction. For more efficient computation and less complexity, long interior loops are restricted or excluded during the RIP process. Conversely, database-based RNAplex is explicitly designed to search for potential hybridization sites in a query RNA. It implements a slightly different energy model than RNAhybrid, shortening computational time and enabling target search on highly stable interactions.
Both RNAduplex and RNAaliduplex, included in the Vienna RNA 2.0 package, predict conserved RRI between two alignments [125]. In contrast, the RIsearch algorithm is designed to rapidly scan genome-wide ncRNA-RNA pairs. It incorporates a simplified Turner energy model into the Smith-Waterman-Gotoh algorithm, approximating the Turner nearest-neighbour energy model using a dinucleotide scoring matrix [129]. Interestingly, GUUGle stands out by not calculating Gibbs free energies to determine optimal interactions. Instead, it generates all ungapped interactions over a user-specified length, serving as an absolute baseline for predicted performance. Moreover, GUUGle is designed to reduce the search space for more complex algorithms [130]. Overall, all the IO methods predict RRIs solely based on intermolecular base pairs.
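The idea of enumerating ungapped intermolecular helices above a length threshold, without any energy calculation, can be sketched with a brute-force diagonal scan. This is an illustration of the concept only and not the implementation of GUUGle or any other tool; the function name `ungapped_duplexes` and its return format are assumptions made here.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def ungapped_duplexes(a, b, min_len=4):
    """List maximal ungapped antiparallel helices between strands a and b.

    Each hit is (a_start, b_start, length): a[a_start:a_start+length] pairs,
    in antiparallel orientation, with b[b_start:b_start+length].
    """
    rb = b[::-1]  # read b 3'->5' so paired positions advance together
    hits = []
    for offset in range(-(len(rb) - 1), len(a)):
        i = max(offset, 0)      # position in a
        j = i - offset          # position in reversed b
        run = 0                 # length of the current run of valid pairs
        while i < len(a) and j < len(rb):
            if (a[i], rb[j]) in PAIRS:
                run += 1
            else:
                if run >= min_len:
                    hits.append((i - run, len(b) - j, run))
                run = 0
            i += 1
            j += 1
        if run >= min_len:      # run reaching the end of a diagonal
            hits.append((i - run, len(b) - j, run))
    return hits


print(ungapped_duplexes("GGGAUC", "GAUCCC", min_len=6))  # [(0, 0, 6)]
```

The scan costs time proportional to the product of the two sequence lengths, so it is only practical for short inputs; genome-scale tools rely on indexed data structures instead.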

Accessibility-based approach
To overcome the shortcomings of IO prediction tools, the accessibility-based (AB) approach was introduced to predict intra- and intermolecular base pairs [87]. AB uses the McCaskill partition function algorithm to predict the pairing likelihood of single nucleotide sequences at each position of the input sequence data [131]. The stability of intermolecular interactions at specific positions is determined by calculating stacking base pairs and the likelihood of intramolecular base pairs being inaccessible within the RNA molecules. The energy needed to prevent interacting RNA segments from forming intramolecular base pairs is known as accessibility energy. Sfold [132], RNAup [133], IntaRNA [134,135], RNAplex [128], RNApredator web server [136] (updated version of RNAplex), OligoWalk [137], BistaRNA [138], inRNAs [139], RIsearch2 [140], RIblast [141] and targetRNA2 [142] are examples of prediction tools that adopted the AB approach (Table 2).
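A common way to express accessibility energy is through the probability that a binding region is unpaired in the structure ensemble, ED = -RT ln(P_unpaired), which is then added to the hybridization energy. The sketch below assumes this relation; the function names and the example values are hypothetical, and in practice the unpaired probabilities would come from a partition function calculation rather than being supplied by hand.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def accessibility_energy(p_unpaired, temp_c=37.0):
    """Energy (kcal/mol) needed to keep a region free of intramolecular pairs.

    p_unpaired is the ensemble probability that the whole region is unpaired.
    """
    if not 0.0 < p_unpaired <= 1.0:
        raise ValueError("p_unpaired must lie in (0, 1]")
    return -R_KCAL * (temp_c + 273.15) * math.log(p_unpaired)


def total_interaction_energy(hybrid_energy, p_unpaired_a, p_unpaired_b,
                             temp_c=37.0):
    """Hybridization energy plus the opening penalties of both binding sites."""
    return (hybrid_energy
            + accessibility_energy(p_unpaired_a, temp_c)
            + accessibility_energy(p_unpaired_b, temp_c))


# Hypothetical example: a -12.0 kcal/mol duplex between a fully open site
# and a site that is unpaired in only 1% of the ensemble.
print(round(total_interaction_energy(-12.0, 1.0, 0.01), 2))
```

A site that is always unpaired costs nothing to open, whereas a rarely unpaired site can cancel a sizeable part of the hybridization energy, which is how AB tools penalise targets buried in stable intramolecular structure.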
The online Sfold tool predicts RNA secondary structure, target accessibility and hybridization energy [132]. It can compute the accessibility of binding regions and calculate the MFE of the RNA duplex via RNAup [133], IntaRNA 2.0 [134] and RNAplex [136]. However, RNAplex and RNAup cannot predict pseudoknots, while IntaRNA 2.0 is limited to interactions between single hairpin loops and excludes kissing hairpins (more complex pseudoknots/multiloops). OligoWalk predicts the hybridization of oligonucleotide binding by calculating the total free energy of an RNA sequence to the target sequence of a known structure [137]. BistaRNA and inRNAs provide insights into RNA accessibility and can predict multiple binding sites [138,139]. Similarly, RNApredator is a fast accessibility-based prediction tool for single small RNA targets that uses a full nonpseudoknot partition function of interacting strands in a dilute solution [136].
RIsearch2 and RIblast are genome/transcriptome-wide scale RIP tools that implement the seed-and-extension approach to discover seed regions using suffix arrays and achieve faster computational speed (64×) than other existing similar programs [141]. The seed regions are further refined using an energy model of the predicted RNA secondary structure [140]. On the other hand, TargetRNA2 is a tool for identifying targets of small regulatory RNAs (sRNAs) in bacteria via conserved regions, secondary structures, individual mRNA target secondary structures, and sRNA-mRNA hybridization energy. In RIP, TargetRNA2 suggests that the more conserved regions two sRNAs have in common, the more likely they are to interact with one another.
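The seed-and-extension strategy can be illustrated in two steps: locate short, perfectly complementary seeds, then grow each seed outwards. The sketch below makes several simplifying assumptions that differ from the tools named above: it uses a dictionary of k-mers instead of a suffix array, accepts only Watson-Crick pairs, and extends ungapped by pair matching rather than by an energy model. The names `find_seeds` and `extend_seed` are hypothetical.

```python
from collections import defaultdict

COMP = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(s):
    """Reverse complement of an RNA string (Watson-Crick only)."""
    return "".join(COMP[c] for c in reversed(s))


def find_seeds(query, target, k=4):
    """Return (i, j) such that query[i:i+k] is fully paired with target[j:j+k]."""
    index = defaultdict(list)            # k-mer -> start positions in target
    for j in range(len(target) - k + 1):
        index[target[j:j + k]].append(j)
    seeds = []
    for i in range(len(query) - k + 1):
        site = revcomp(query[i:i + k])   # sequence of a perfectly paired site
        for j in index.get(site, []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, target, i, j, k):
    """Extend a seed without gaps; return half-open (q_start, q_end, t_start, t_end)."""
    qs, qe, ts, te = i, i + k, j, j + k
    # Query 3' direction pairs with target 5' direction (antiparallel).
    while qe < len(query) and ts > 0 and COMP[query[qe]] == target[ts - 1]:
        qe += 1
        ts -= 1
    while qs > 0 and te < len(target) and COMP[query[qs - 1]] == target[te]:
        qs -= 1
        te += 1
    return qs, qe, ts, te


query, target = "UUGGACGA", "CCUCGUCCGG"
seeds = find_seeds(query, target, k=4)
print(seeds)                                                    # [(2, 4), (3, 3), (4, 2)]
print({extend_seed(query, target, i, j, 4) for i, j in seeds})  # {(2, 8, 2, 8)}
```

Overlapping seeds collapse onto the same extended helix, so collecting the extensions in a set removes the duplicates before any more expensive scoring step.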

Concatenation-based approach
The third subclass of the MFE-based RIP tool involves both intermolecular and intramolecular base pairing of RNA. This approach is called concatenation-based, where two input sequences are concatenated and run through classical RSP algorithms to compute internal and external base pairs simultaneously [87]. Examples of concatenation-based tools include RNAsoft [143], PairFold [144], RNAfold [125], MultiFold [144], RNAcofold [125,145], UNAFold (mfold/RNAfold) [146], RNAnue [147] and NUPACK [148,149] (Table 3). However, they are limited due to the inability to predict pseudoknots accurately, where the base pairs are not well nested but overlap with each other.
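Once two sequences have been concatenated and folded as one, each predicted base pair can be assigned to the first strand, the second strand, or the interface between them simply from its coordinates. The following sketch shows that bookkeeping; the helper names are hypothetical, and the dot-bracket string is assumed to describe the concatenated sequence without any separator character.

```python
def pairs_from_dot_bracket(structure):
    """Convert a nested dot-bracket string into a list of (i, j) base pairs."""
    stack, pairs = [], []
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return pairs


def classify_pairs(pairs, len_a):
    """Split pairs of a concatenated structure into intra-A, intra-B and inter."""
    result = {"intra_a": [], "intra_b": [], "inter": []}
    for i, j in pairs:
        i, j = min(i, j), max(i, j)
        if j < len_a:
            result["intra_a"].append((i, j))   # both ends on the first strand
        elif i >= len_a:
            result["intra_b"].append((i, j))   # both ends on the second strand
        else:
            result["inter"].append((i, j))     # pair spans the junction
    return result


# First strand: 7 nt with a small hairpin; second strand: 2 nt.
print(classify_pairs(pairs_from_dot_bracket("(...)(())"), len_a=7))
```

Because the folded structure must stay nested, an interaction such as a kissing hairpin, in which loop nucleotides of two hairpins pair with each other, cannot be represented in this scheme.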
In 2003, Andronescu et al. introduced the RNAsoft suite of programs to predict the secondary structure (PairFold), test combinatorial tag sets (CombFold) and design RNA strands (RNA Designer) [144,150,151]. PairFold is the first tool to predict suboptimal secondary structures of two interacting strands, and MultiFold is the first to handle multiple strands. Both programs use the standard thermodynamic parameters of Turner for RNA molecules [113,132,144]. RNAfold is a web tool that predicts the secondary structures of single-stranded RNA sequences [125]. Compared to RNAfold, RNAcofold allows the prediction of RNA secondary structures of single-stranded RNA sequences upon dimer formation [125,145]. On the other hand, the unified nucleic acid folding and hybridization package (UNAFold) is an amalgamation of mfold and DINAMelt. It predicts the pseudoknot-free RNA secondary structure of a single RNA sequence by simulating its folding, hybridization, and melting pathways. The prediction minimizes the global free energy using an improved algorithm by Zuker and Stiegler [125,146,151,152]. RNAnue predicts inter- and intramolecular RRIs using complementary strands of double-stranded RNA information through direct duplex detection (DDD) methods [147].

Multiple sequence alignments and complex joint approach
Sequence alignment is a method to align DNA, RNA or protein sequences, predicting conserved regions that represent functional or evolutionary relationships between two sequences. Pairwise alignment determines the best-matching pattern of two sequences, whereas multiple sequence alignment involves multiple sequences simultaneously. Local alignment identifies local regions with the highest similarity level in sequences, whereas global alignment spans the entire sequence. RNAplex [128] and RNAduplex [125] are programmes that predict conserved RRIs using sequence alignments.
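The difference between global and local pairwise alignment can be seen in a single scoring routine. This is a minimal sketch with a linear gap penalty; the function name `alignment_score` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration and are not the parameters of any tool discussed here.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-1):
    """Optimal pairwise alignment score of sequences a and b.

    local=False: global alignment, both sequences aligned end to end.
    local=True:  local alignment, best-scoring pair of subsequences.
    """
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps; local alignment does not.
        for i in range(1, rows):
            dp[i][0] = i * gap
        for j in range(1, cols):
            dp[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            value = max(dp[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                        dp[i - 1][j] + gap,       # gap in b
                        dp[i][j - 1] + gap)       # gap in a
            if local:
                value = max(value, 0)             # restart instead of going negative
            dp[i][j] = value
            best = max(best, value)
    return best if local else dp[-1][-1]


print(alignment_score("GGGACGU", "ACGUCCC", local=True))   # 4 (shared ACGU)
print(alignment_score("ACGU", "ACGA"))                     # 2 (3 matches, 1 mismatch)
```

Note that this routine scores sequence similarity; interaction-oriented variants replace the match/mismatch scores with base-pair complementarity or dinucleotide energy terms.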
Another class of MFE-based RIP tools is known as 'complex joint' (CJ), as it uses MFE computation to identify RRIs between multiple RNA alignments. Unlike single RNA secondary structure-based RIP tools [33,44], CJ can handle more complex joint structures with multiple interaction sites [153][154][155][156][157][158]. This capability is crucial, as ncRNAs often interact with target mRNAs in gene translation. Moreover, these relatively long regulatory antisense RNAs are not fully complementary to their target sequences. Instead, they rely on stable joint structures with mRNA via loop-loop interactions to facilitate regulatory functions [155]. Nevertheless, predicting these RNA secondary structure complexes with MSA is challenging (a nondeterministic polynomial-time (NP)-hard problem), and only a few dedicated tools are available.
MultiRNAFold is a CJ-based package that includes three types of software: SimFold, PairFold and MultiFold [144]. It computes the MFE for predicting the secondary structure of interacting RNA molecules. Early attempts, such as PairFold [144] and RNAcofold [159], treated two interacting RNA sequences as a single sequence but faced challenges in predicting complex interactions such as kissing hairpins.
In 2007, Dirks et al. [160] introduced the NUPACK package, which efficiently computes the partition function of a single to multiple RNAs and concatenates input sequences in order, considering their symmetries and sequence heterogeneity. Similarly, BPPart, a revised algorithm of rip [157] and piRNA [154], computes the partition function for joint structures. The energy model is simplified by ignoring the entropy systems while retaining the thermodynamic information captured by more complex models [161]. The inRNAs algorithm predicts multiple binding sites in an RNA complex [139], while RIG utilizes multiple context-free grammars to model RRI [162]. Other CJ tools, such as IRIS [156], inteRNA [153] and piRNA [154], were previously available, but they are obsolete or no longer supported.
This review highlights that CJ methods are limited to relatively short RNA sequences to improve runtime performance. Although longer sequences cover a broader class of interacting RNA structures simultaneously, they are highly resource intensive and impractical for genome-wide scans. To overcome this challenge, Kato et al. [163] developed RactIP (RNA-RNA interaction prediction using integer programming), a novel method to increase the input RNA sequence length while optimizing runtime performance and prediction accuracy using the threshold cut technique.

Comparative sequence analysis for RNA structures and RNA-RNA interaction prediction
The structures of functional ncRNAs are crucial in understanding their functions and evolutionary conservation. Structural alignment compares a folded RNA to known reference ncRNAs, identifying similar regions called 'conserved regions'. Comparative sequence analysis allows the identification of these conserved regions. The alignment score represents the similarity in the ncRNA sequence and structure. Comparative analysis suggests that nucleotides forming base pairs in RNA secondary structures tend to be more conserved and covary during evolution to maintain Watson-Crick and wobble pairings (compensatory mutations) [87,164,165]. This supports the theory that base pairs with fully conserved or retained structures from compensatory mutations are more functionally important than unconserved base pairs [87].
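The covariation signal described above can be illustrated with a small Python sketch that inspects two columns of an alignment. The function name `column_pair_support`, the toy alignment and the choice of summary values (fraction of sequences that can pair, number of distinct pair types) are assumptions for illustration only; published tools use more elaborate covariation scores.

```python
from typing import List, Tuple

# Watson-Crick pairs plus the G-U wobble pair.
CANONICAL = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def column_pair_support(alignment: List[str], i: int, j: int) -> Tuple[float, int]:
    """Summarise how well alignment columns i and j support a base pair.

    Returns the fraction of sequences whose residues at (i, j) can pair and
    the number of distinct pair types observed. Several distinct pair types
    at a consistently pairing column indicate compensatory mutations.
    """
    pairs = [(seq[i], seq[j]) for seq in alignment]
    pairing = [p for p in pairs if p in CANONICAL]
    return len(pairing) / len(pairs), len(set(pairing))


if __name__ == "__main__":
    # Toy alignment (hypothetical): columns 0 and 4 covary, columns 1 and 3 do not pair.
    toy = ["GAAAC", "CAAAG", "AAAAU", "GAAAU"]
    print(column_pair_support(toy, 0, 4))
    print(column_pair_support(toy, 1, 3))
```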
Multiple sequence alignment (MSA) is one of the oldest comparative studies used to detect common secondary structures from a set of homologous sequences. By including well-aligned and sufficiently divergent homologues, MSA provides valuable information for predicting evolutionarily conserved base pairs. This approach also significantly improves the accuracy of the RSP tool and overcomes shortcomings of the MFE-based approach, such as the difficulty in aligning RNA sequences with low similarity (<60%) and folding different primary sequences into the same secondary structures.
To date, comparative sequence analysis (homology) is more accurate than DPP approaches in RSP [166,167]. This review highlights three major components of comparative sequence analysis (Figure 4A), including several examples of freely available homology-based tools in RIP, as tabulated in Tables 4-7 [164].

Align-then-fold approach
The align-then-fold approach extends RSP to multiple sequences by aligning them based on similarity and then predicting the structure with the lowest free energy that is shared by the largest number of sequences [168]. This approach requires a conventional alignment tool (e.g., ClustalW [169,170], MAFFT [171]), followed by RSP tools (e.g., RNAalifold [172], Pfold [173]). The RNAalifold web server is one of the most important and commonly used tools (combined with score-based methods) [172], whereas Pfold includes compensatory mutations for accurate secondary RSPs [173]. Meanwhile, PETfold combines thermodynamic and evolutionary perspectives into a single model [174]. In short, the align-then-fold method is efficient for sequences with high similarity (>60%) and is computationally less expensive than the Sankoff-type and fold-then-align methods.
Table 4 provides a comprehensive overview of align-then-fold RSP tools.

Sankoff-type approach
The Sankoff algorithm is the most rigorous and computationally expensive approach to align RNA structures [175]. It combines structural prediction and sequence comparison simultaneously, ensuring similarity between structures by considering base-pair input in both [175][176][177]. This approach yields more accurate predictions than methods that separate folding and alignment steps, but it requires additional computer memory [178]. The Sankoff-based tools include MARNA [179], Foldalign [180][181][182], Dynalign [183], Stemloc [184] and MXSCARNA [185] (Table 5). They employ the Sankoff algorithm to explore the structural space and calculate the optimal secondary structure considering both sequence and structure conservation [175][176][177]. Additionally, some variants use sequence-based heuristics to reduce computational complexity and align efficiently.
Other tools, such as PMcomp [186] and LocARNA [187], use McCaskill's algorithm to calculate base-pair probabilities via dynamic programming (Table 6), whereas FoldalignM [188] and Murlet [189] employ a different algorithm called 'maximum expected accuracy' (MEA). StrAl with PETcofold [190] combines Sankoff and McCaskill's algorithm, using Sankoff for RSP and McCaskill's algorithm for base-pair probability calculation. This approach reduces the structural search space, computational complexity, and runtime by utilizing a simplified energy model based on precalculated base-pair probabilities from McCaskill's algorithm, rather than directly calculating loop energies as in the Sankoff approach. Notably, RNA alignment and folding is not part of the Sankoff algorithm but a separate algorithm integrating sequence alignment and RSP, providing a comprehensive analysis of both sequence and structure aspects. It combines subsequence alignment quality-based heuristics and the simplified energy model of PMcomp to simultaneously align and fold unaligned RNA sequences [184,191].

• Estimation of the tree using a maximum likelihood approach in the SCFG model [265]
• Prediction of structure given as a bracket notation via CYK algorithm [266]
• Evaluation of the reliability of the prediction for each position

Fold-then-align approach
The fold-then-align method involves first predicting the secondary structures of RNA sequences and then identifying the structure with the lowest free energy across all sequences. This method often employs MSA to improve conserved RSPs. Another approach explores a middle path, where individual secondary structures are identified for each sequence in sets, followed by postprocessing to determine the optimal structure shared by all sequences. However, the accuracy depends on the quality of input RNA structures and may be limited by the number of matched homologous sequences, leading to potential false positives. Consequently, the overall alignment quality is typically affected by individual RSP approaches [192]. RNAforester [193], RNAcast [193] and aliFreeFoldMulti [194] are examples of applications implementing the fold-then-align method (Table 7).
To improve accuracy despite limitations in alignment quality, Notredame and colleagues developed the T-Coffee tool by implementing a preprocessing procedure that generates a library of local and global pairwise alignments [195]. It creates a consensus MSA by combining signals from diverse heterogeneous sources, such as sequence and structure alignment programs. Other methods, including planACstar [196], MASTS [197] and RNA Sampler [198], use sampling techniques to refine alignment and folding structures. However, CMfinder [199] and LaRA [200] stand apart from conventional categories because CMfinder specifically detects new ncRNA families by combining RSP and covariance models, whereas LaRA focuses on the identification of local RNA alignments considering both sequence and secondary structure conservation. In short, thermodynamic-based methods work with single RNA sequences due to similar algorithms as RSP systems, while comparative sequence analysis methods require MSA to enhance the accuracy and performance of RSP or RIP.

Pairwise alignments
The conventional approach for comparative sequence analysis mainly focuses on RSP due to several challenges in RIP. These challenges include the limitation of prediction to in vitro settings, the prevalence of false-positive predictions due to the large number of predicted RNA-RNA duplexes and potential interaction partners, and the impact of external factors (other interacting RNAs/small ligands/proteins in vivo). Comparative RIP identifies the role of an RNA regulator via direct base-pairing with its target RNA.
Two primary strategies for comparative RIP are shown in Figure 4B. Similar to comparative RSP, the first RIP method (individual RIP) predicts the interaction between two alignments rather than two distinct sequences. Hypothetically, strong sequence signals distinguish binding sites and interactions based on their conserved structural residues. It is commonly believed that homology can help deduce binding sites and interactions. Tools such as PETcofold [174] and RNAripalign [201] leverage this hypothesis. PETcofold is an extended version of PETfold capable of predicting conserved RRIs [174], whereas RNAripalign identifies RRIs based on sequence and structural conservation [201].
Richter and Backofen [202] proposed that interaction sites between RNAs may not always be strictly conserved, suggesting that conserved interactions can occur even without precisely conserved interaction sites. However, this proposal contradicts most of the alignment-based hypotheses that assume strict conservation of interaction sites. Hence, a second method was introduced that combines individual RIP tools without requiring a strict consensus. It generates more reliable results and uncovers conserved regulatory mechanisms across different systems. This second method outperforms individual RIP tools. RNAhybrid, published by Krüger and Rehmsmeier in 2006 [203], predicts homologous miRNAs on orthologous targets from various organisms.
However, duplex energies predicted by RNAhybrid must be transformed into P values, as the former is strongly influenced by the GC content and frequency of dinucleotides of the selected organisms. As duplex prediction relies on base-pair stacking, maintaining the dinucleotide frequency is crucial, and mononucleotide shuffling would prevent the generation of random sequences that accurately represent the features of the nonrandom system. The joint P value is used to identify possible interactions between two RNA alignments [25]. Similarly, CopraRNA uses Hartung's method to compute a joint P value for a cluster of homologous RNA sequences [204,205].
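As a rough illustration of turning a duplex energy into a P value, the Python sketch below compares an observed energy with energies obtained from shuffled background sequences. It is a plain empirical estimate under stated assumptions: the function name `empirical_p_value` and all numbers are hypothetical, the background energies are assumed to come from dinucleotide-preserving shuffles computed elsewhere, and this is not the procedure implemented in RNAhybrid or CopraRNA.

```python
from typing import Sequence


def empirical_p_value(observed_energy: float,
                      background_energies: Sequence[float]) -> float:
    """Estimate how often a background duplex is at least as stable as the observed one.

    Lower (more negative) energies mean more stable duplexes. Adding 1 to the
    numerator and denominator avoids reporting a P value of exactly zero.
    """
    as_stable = sum(1 for energy in background_energies
                    if energy <= observed_energy)
    return (as_stable + 1) / (len(background_energies) + 1)


if __name__ == "__main__":
    # Hypothetical energies (kcal/mol) for shuffled target sequences.
    background = [-5.0, -8.0, -21.0, -3.0, -10.0, -7.0, -6.0, -9.0, -4.0]
    print(empirical_p_value(-20.0, background))
```

Using shuffles that preserve dinucleotide frequencies for the background keeps the stacking composition comparable to the real sequences, which is the reason given above for avoiding mononucleotide shuffling.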
Table 8 provides a comprehensive summary of RIP tools focussing on pairwise alignment in comparative sequence analysis.

Pseudoknots: Loops and helical stems in RNA folding thermodynamics
RNAs contain an abundance of motifs, which are defined as discrete sequences or combinations of base juxtapositions. Structural motifs in RNA can form pseudoknots by base-pairing of single-stranded RNA regions in the hairpin loop with complementary nucleotides in the RNA chain [206]. The H-type pseudoknot is the most basic example, with a hairpin loop interacting with complementary nucleotides outside the loop [207]. Pseudoknots are critical components of RSP and RIP due to their involvement in translation readthrough mechanisms and are essential for identifying RNA complex functions [208]. Hinh et al. [209] also discovered a novel role of the 'trans-pseudoknot' RRI in the functional dimerization of human telomerase.
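In terms of base-pair indices, a pseudoknot corresponds to two pairs (i, j) and (k, l) that cross, that is, i < k < j < l. The short Python sketch below lists such crossings for a given set of pairs; the function name `crossing_pairs` and the example coordinates are hypothetical and chosen only to mimic an H-type arrangement.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[int, int]


def crossing_pairs(pairs: Iterable[Pair]) -> List[Tuple[Pair, Pair]]:
    """Return every pair of base pairs (i, j), (k, l) with i < k < j < l."""
    ordered = sorted((min(p), max(p)) for p in pairs)
    crossings = []
    for index, (i, j) in enumerate(ordered):
        # Later pairs start at k >= i, so only the crossing test remains.
        for k, l in ordered[index + 1:]:
            if i < k < j < l:
                crossings.append(((i, j), (k, l)))
    return crossings


if __name__ == "__main__":
    # Hypothetical H-type layout: a hairpin stem (0-10, 1-9) whose loop
    # pairs with downstream nucleotides (5-15, 6-14).
    h_type = [(0, 10), (1, 9), (5, 15), (6, 14)]
    print(len(crossing_pairs(h_type)) > 0)  # structure is pseudoknotted
```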
Additionally, the relationship between pseudoknots, RNA folding stability and conformational changes suggests that the interplay between loops and helical stems is essential in calculating RNA stability and folding thermodynamics [210][211][212][213]. Evaluating folding thermodynamics involves applying energy parameters to calculate the conformation energy and chain entropy, but this process can be computationally demanding and is limited to specific subclasses of pseudoknots [214].
For instance, using the DPP algorithm, Rivas and Eddy [215] developed an RSP tool called PKNOTS to fold optimal pseudoknotted RNAs (ranging from 100 to 200 nt), marking the beginning of prediction attempts on the secondary structure of RNA pseudoknots. PKNOTS can handle the broadest class of structures but is limited to small molecules due to its long running time [216]. Another DPP-based tool, HotKnots, offered faster prediction using a heuristic approach but could not guarantee the lowest free energy due to the vast conformational space and computational complexity. The search space is typically enormous, making an exhaustive search infeasible [216]. In short, existing DPP algorithms for pseudoknot prediction are both unreliable and inefficient.
Comparative methods are more reliable in predicting pseudoknot structures, but they are often selected in an ad hoc manner for specific purposes and require expert intervention [217]. The maximum weighted matching (MWM) algorithm can generate meaningful predictions, but it requires a large number of homologous sequences to detect strong covariance signals. However, the MWM algorithm is sensitive to noisy data such as misalignment, as it allows unrealistic interactions and may overlook the prevalence of helices as the most common structural elements in RNA structures [218,219].
On the other hand, the iterated loop matching (ILM) algorithm combines both thermodynamic and comparative approaches to predict the secondary structure of RNA pseudoknots efficiently and reliably, even when only a few sequences are available. The ILM algorithm prioritises the formation of stable helices over computing a theoretically optimal structure, which proves to be beneficial by significantly enhancing the overall prediction accuracy. This advantage is particularly significant in situations where the available data are insufficient for a method such as MWM to generate reliable predictions using unrestricted models [220,221].
Other examples of pseudoknot prediction tools are FlexStem and Kinefold. FlexStem constructed secondary RNA structures with pseudoknots by adding maximal stems based on the free energy model [222], whereas Kinefold used a long-term RNA folding simulation to predict pseudoknot structures with topological and geometrical constraints [223].
External pseudoknots or crossing interactions are formed when two interacting RNAs form pseudoknots. However, most of the thermodynamic-based tools disallow the formation of pseudoknots and therefore fail to predict joint structures formed by nontrivial interactions between two RNAs. To address this problem, Eckart et al. developed NanoFolder, a program that predicts the base pairing of potential pseudoknots in RNA nanostructures. First, a simple energy model is used to calculate all possible helices, followed by a greedy algorithm that selects the minimum free energy helices for incorporation into the RNA complex [224]. Compared to NanoFolder, VfoldCPX uses a similar approach but a more advanced selection algorithm [225]. Meanwhile, IPknot can predict RNA secondary structures containing a diverse set of pseudoknots from an individual sequence or an MSA as input [226]. Although comparative sequence analysis can predict pseudoknots, its accuracy is still limited. In brief, most of the computational methods predict the structure and RRI of pseudoknots using a thermodynamic-based approach, as reported in Table 9.
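The two-step idea of enumerating helices and then choosing among them greedily can be sketched as follows. This is a simplified illustration, not the NanoFolder implementation: the `Helix` record, its fields, the helix energies and the rule that a nucleotide may belong to only one helix are all assumptions made here. Crossing helices are deliberately permitted, so pseudoknots can appear in the result.

```python
from typing import Iterable, List, NamedTuple, Set


class Helix(NamedTuple):
    start5: int    # first nucleotide of the 5' strand
    end3: int      # last nucleotide of the 3' strand (pairs with start5)
    length: int    # number of stacked base pairs
    energy: float  # hypothetical helix free energy in kcal/mol

    def positions(self) -> Set[int]:
        """All nucleotide positions occupied by this helix."""
        five = range(self.start5, self.start5 + self.length)
        three = range(self.end3 - self.length + 1, self.end3 + 1)
        return set(five) | set(three)


def greedy_select(helices: Iterable[Helix]) -> List[Helix]:
    """Pick stabilising helices from most to least stable, skipping overlaps."""
    used: Set[int] = set()
    chosen: List[Helix] = []
    for helix in sorted(helices, key=lambda h: h.energy):
        occupied = helix.positions()
        if helix.energy < 0 and used.isdisjoint(occupied):
            chosen.append(helix)
            used |= occupied
    return chosen


if __name__ == "__main__":
    candidates = [
        Helix(0, 20, 4, -6.0),   # most stable, taken first
        Helix(2, 30, 3, -4.0),   # shares nucleotides 2 and 3, rejected
        Helix(8, 26, 3, -3.5),   # crosses the first helix, still accepted
    ]
    for helix in greedy_select(candidates):
        print(helix)
```

A greedy choice of this kind is fast but gives no guarantee of reaching the global minimum free energy, which is one reason more elaborate selection schemes exist.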

CHALLENGES IN RNA STRUCTURE AND RNA-RNA INTERACTION PREDICTION
With the rapid growth of biological data and technologies, there has been a surge in research on predicting RNA structure and RRI using computational approaches. However, researchers often overlook that the outputs from these tools do not reflect the actual RNA structure but rather the assumptions built into the algorithms. In thermodynamic-based approaches, base pairs with higher free energies are occasionally ignored due to the lack of evidence in the literature. Representation of the 'prediction/theoretical' as the 'true/actual' RNA secondary structure or RRI results in the acceptance of an untested possibility without further investigation [82]. Moreover, the kinetic RNA structures that form during folding may serve as a crucial indicator of RNA functions [227]. For instance, riboswitches usually regulate metabolic functions via structural conformation changes instead of retaining a static native structure [228]. In addition, noncanonical base pairs also play a crucial role in forming tertiary RNA structures, necessitating their inclusion in the prediction process. Nevertheless, predicting both canonical and noncanonical base pairs remains a challenge. Noncanonical interactions must still be optimised as they may contain additional chemical probing information that facilitates RNA structure modelling and comprehension of functional RNA modules. In addition, predictions of RNA tertiary structure are less accurate in loop regions, where noncanonical pairs are required to evaluate structural details [229,230]. Comparative-based techniques are limited by the need for a more extensive set of homologous sequences. Due to the limited knowledge of known RNA families, obtaining homologous sequences for all RNAs is unfeasible, resulting in a preference for score-based RSP with a single RNA sequence as input. The 'predicted' outputs should not be regarded as a substitute for comprehensive experimental RSP and RIP determination, as these algorithm-based prediction tools operate under the assumption that the nucleotides are likely to engage in secondary structure elements with the maximum predicted number of Watson-Crick base-pairings [117,231,232]. The automatic modelling methodology is another challenge in RSP and RIP tools. Due to limited experimental data, most currently available automated web servers rely only on RNA sequences as input and achieve low accuracy. Therefore, integrating experimental data into computational methods will help enhance the accuracy of RSP [79].
To improve the prediction accuracy of RIP and RSP tools, we concluded that five main challenges must be addressed: (i) the limited number of examples with mapped interactions, (ii) the limited focus on kinetic RNA structures, (iii) the low specificity due to the restriction to single sequences, (iv) the overreliance on 'predicted' output rather than experimental data and (v) the high computational cost of searching complex interaction types when a guaranteed maximum score is required.

ARTIFICIAL INTELLIGENCE: CURRENT TRENDS AND FUTURE DIRECTIONS
Artificial intelligence has emerged as a powerful approach to predicting RNA structure and function [233]. In previous years, numerous prediction methods have been developed with the primary goal of identifying RNA structures that are likely to exhibit an MFE state, as has also been done for proteins [234]. However, over the past two decades, machine learning (ML) has been proposed as an alternative methodology to enhance the accuracy and calculation speed of RIP and RSP tools [235]. It was previously overlooked due to limited accuracy resulting from small training datasets and the constraints of simplistic ML models [236]. Due to the recent surge in RNA sequence data and advancements in ML, particularly deep learning (DL), the latest ML-based approaches surpass existing traditional methods in both accuracy and applicability, providing an advantage in tackling complex questions in structural biology while dealing with large datasets. DL algorithms leverage reference structures to train scoring parameters for decomposed substructure analysis, making them a more efficient and scalable alternative to traditional experimental procedures [237].
RNA Interactome Scoper (RIscoper) is a ground-breaking AI tool based on natural language processing (NLP) that extracts RNA structure and interactions from published literature using an N-gram model [238]. NLP automates tasks by extracting useful information from unstructured text and converting it into a structured format for computational analysis. NLP techniques have substantially improved in recent years, demonstrating their effectiveness across various domains. These include literature-based discovery, aiding the analysis of high-throughput data such as gene expression and genome-wide association studies [239]. ML-based approaches, on the other hand, can be categorised into two major groups, each aligned with a distinct phase in the RSP and RIP process: ML-based scoring schemes and ML-driven prediction processes.
Score-based methods are the most widely used traditional computational methods and have dominated the field of RIP and RSP. Scoring methods assume that RNA structures must satisfy specific score-based criteria, which can vary depending on the RNA folding mechanism, making secondary RSP an optimization problem. Dynamic programming (DP) algorithms are commonly employed to discover the optimal structure by dividing it into smaller components with individual scores and require a sophisticated scoring scheme with numerous parameters. However, DP algorithms are often deemed inefficient for large inputs, as their running time increases rapidly with the input size based on RNA sequence length and may overlook unique base pairs and weak interactions [233]. Understanding the RNA folding mechanism through the score-based method is thus a formidable challenge, in contrast to data-driven ML methods that do not rely on such mechanisms.
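The way DP divides a structure into smaller scored components can be seen in the classic base-pair maximisation recursion (Nussinov-style), sketched below in Python. Here the score is simply the number of canonical pairs, which is a deliberate simplification of the energy-based scoring schemes discussed above; the function name `max_base_pairs` and the minimum hairpin loop length of three unpaired nucleotides are assumptions for illustration.

```python
CANONICAL = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested canonical base pairs in seq.

    dp[i][j] holds the best score for the subsequence seq[i..j]; each entry
    is built from shorter subsequences, giving cubic running time in len(seq).
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                   # case 1: j stays unpaired
            for k in range(i, j - min_loop):      # case 2: j pairs with k
                if (seq[k], seq[j]) in CANONICAL:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAUCC"))
```

The three nested loops make the cubic growth in running time explicit, which illustrates why such algorithms become costly for long sequences.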
In this review, we highlighted two categories of ML-based methods for RIP and RSP according to the subprocess, namely (i) ML-based scoring schemes (free energy parameter-refining approach, weighted approach, and probabilistic approach) and (ii) ML-driven prediction processes (end-to-end approach and hybrid approach) (Table 10). All ML methods within these two categories trained their models through supervised learning, wherein model parameters were adjusted based on input-output pairs. RIP and RSP primarily employ features such as free energy parameters, RNA sequences, and sequence patterns as input, and the trained model outputs can be either classification labels or free energy values. The probabilistic approach based on ML is one of the earliest scoring schemes that used stochastic context-free grammars (SCFGs) to predict RNA structures and interactions. Datasets containing RNA sequences annotated with known secondary structures are used to estimate the probability parameters of the SCFG model [240].
Andronescu and colleagues introduced the constraint generation (CG) method, a pioneering computational approach for estimating RNA free energy parameters. This approach was designed to train efficiently on large datasets containing structural and thermodynamic information. By incorporating ML techniques, CG can predict and design RNA secondary structures with high accuracy [241]. Another notable tool, CONTRAfold, takes a different approach by using conditional log-linear models that generalise SCFGs through discriminative training and feature-rich scoring. This allows CONTRAfold to accurately predict RNA secondary structures based on probabilistic models [242]. ContextFold employs feature-rich scoring models that are trained extensively on large datasets [243]. This approach captures more complex relationships in the data, but there is a potential risk of overfitting, where the model becomes too specific to the training data and performs poorly on new, unseen data [244].
The ML-driven prediction process, on the other hand, adopts deep learning (DL) in predicting RNA structure [245]. SPOT-RNA, for instance, focuses on leveraging deep neural network learning to predict all base pairs, regardless of their association with local or nonlocal interactions. This approach leverages the power of DL to capture intricate patterns and features within RNA sequences [246]. To overcome limitations and enhance prediction accuracy, hybrid approaches have been introduced [233]. One example is the combination of thermodynamic and ML-based strategies, where the model of CONTRAfold and MFE (concatenation-based method and complex joint category) is used to predict RNA interactions [163,242]. This hybrid method leverages the strengths of both thermodynamic principles and ML techniques to improve the accuracy of RIP. Nucleic Acid Package 4.0 (NUPACK 4.0), a hybrid tool, integrates ML-based and concatenation-based MFE methods for analysing and designing interacting RNA strands across multiple species. It enables the examination of RNA sequences in complex and test tube ensembles containing an arbitrary number of interacting strand species [148,149].
For RSP, a method called DMfold has been proposed. DMfold combines deep learning and an improved base-pair maximization principle to predict RNA secondary structures with pseudoknots. By learning from similar RNA sequences instead of highly homogeneous sequences, DMfold reduces the requirement for auxiliary sequences and improves folding accuracy [247]. Motif identifier for nucleic acids trajectory (MINT) is an automatic tool to analyse 3D structures of RNA molecules, their molecular dynamics trajectories and other conformation changes [248]. On the other hand, CompaRNA utilizes a combination of 28 single-sequence methods and 13 comparative methods for continuous automated benchmarking [249,250]. Although CompaRNA is primarily based on comparative sequence analysis rather than the ML method, it incorporates several ML-based tools, such as ContextFold and CONTRAfold, as part of its analysis pipeline [242,243]. This demonstrates the synergy between comparative sequence analysis and machine learning, where ML algorithms complement evolutionary information and sequence conservation to improve predictions.
While ML techniques have significantly enhanced prediction methods in terms of accuracy, applicability, and processing speed, there remains a need for more sophisticated ML models to fully address the challenges of the RSP and RIP problems, particularly in predicting high-resolution structures [233]. Nevertheless, given the rapid expansion of RNA sequence data, the availability of high-performance hardware and continuous advancements in machine learning methods, there is a potential for the future development of cutting-edge RSP and RIP tools that could surpass traditional approaches in terms of both execution speed and accuracy.

SELECTING THE BEST APPROACH: PRACTICAL RECOMMENDATIONS
Choosing the most suitable method for RIP or RSP depends on the specific research objectives. For instance, if the primary goal is RIP and the identification of binding sites, the IO method may be the preferred option since it excels at detecting interaction regions and base-pairing sites. However, IO methods are not designed to provide detailed structural information about the individual molecules involved [87,127,142,203]. On the other hand, the concatenation-based method is selected for predicting the MFE structure of an entire RNA molecule, considering potential intramolecular interactions and structural elements. These methods offer a comprehensive perspective on the folding behaviour of RNA and have the capability to capture complex structures and interactions. However, they are frequently computationally demanding, particularly when applied to large RNA molecules [87,143].
Accessibility-based MFE algorithms, as employed in RNAup, IntaRNA, and RNAplex, have demonstrated superior performance in RSP and RIP when compared to the previous two types of tools [128,133,134]. In an analysis of a bacterial dataset by Umu and Gardner in 2017 [86], these algorithms showed their ability to distinguish nearly half of the native interactions from the background noise. This accomplishment is facilitated by the integration of well-designed negative controls such as dinucleotide shuffling, enabling the utilization of predicted MFE values and distinct scoring mechanisms to effectively discriminate native interactions from spurious ones [86,251]. These accessibility algorithms are especially valuable for de novo predictions, particularly in scenarios where computational efficiency is essential, as is the case with IntaRNA and RNAplex, given that candidate target RNAs can be extensive, spanning thousands of nucleotides [128,134,135,205]. RNAplex, in particular, excels at identifying correct interaction regions that might be embedded within larger RNA targets [128]. In essence, accessibility-based MFE algorithms outperform IO and concatenation-based tools due to their consideration of RNA sequence structural accessibility and evaluation of base-pairing potential, improving the capability to discern real interactions from nonspecific interactions.
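The accessibility principle can be summarised numerically: the energy gained by forming the duplex is offset by the energy needed to make both binding sites single-stranded. The Python sketch below assumes the commonly used relation in which the opening energy of a site equals -RT ln(P_unpaired), where P_unpaired is the probability that the site is unpaired; the function names and all example values are hypothetical, and the sketch does not reproduce the scoring of any specific tool.

```python
import math

# Gas constant in kcal/(mol*K) multiplied by 310.15 K (37 degrees Celsius).
RT = 0.0019872 * 310.15


def opening_energy(p_unpaired: float) -> float:
    """Free energy cost (kcal/mol) of keeping a site unpaired."""
    if not 0.0 < p_unpaired <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -RT * math.log(p_unpaired)


def total_interaction_energy(hybrid_energy: float,
                             p_unpaired_query: float,
                             p_unpaired_target: float) -> float:
    """Duplex energy plus the opening penalties of both binding sites."""
    return (hybrid_energy
            + opening_energy(p_unpaired_query)
            + opening_energy(p_unpaired_target))


if __name__ == "__main__":
    # Hypothetical duplex of -10 kcal/mol; the target site is unpaired
    # in only 10% of the structural ensemble.
    print(round(total_interaction_energy(-10.0, 1.0, 0.1), 2))
```

A site that is almost always buried in intramolecular structure therefore incurs a large penalty, so a duplex that looks favourable to an IO method can be ranked much lower once accessibility is taken into account.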
In the context of selecting RSP and RIP tools based on comparative sequence analysis, Pfold and RNAalifold generally exhibit strong performance, especially for well-aligned short sequences [172,173]. However, it is worth noting that RNAalifold is faster and is better suited for well-aligned, longer RNA sequences [172]. For datasets comprising short sequences (<200 bases) with significant diversity, Dynalign is a suitable choice because it does not rely on sequence similarity, and its scoring function excludes sequence comparisons [183]. In other scenarios, a combination of RNAalifold and/or Pfold can be employed to fold similar RNA sequences [172,173], while RNAforester and/or MARNA can be used to align these folded RNA molecules [252,253]. Notably, most of the MSA algorithms either do not favour transitions over transversions or employ ad hoc two-parameter methods to model these distinctions (e.g. ClustalW [170]). This can be relevant because structural RNA sequences often evolve rapidly through structure-neutral mutations, which tend to involve transitions rather than transversions [254,255]. Therefore, multiple sequence alignment algorithms that utilise more sophisticated yet accurate models of sequence evolution are likely to produce improved alignments for folding [164].
Table 11 offers a comprehensive overview of the advantages and limitations associated with MFE-based RSP and RIP tools. Additionally, Figure 5 presents a chronological depiction of the development timeline of RSP and RIP tools. Understanding this timeline is crucial for selecting the most appropriate tools based on research objectives and the evolution of available technologies.

CONCLUSION
In recent years, the intersection of structure-based RNA analysis and computational biology has garnered significant attention as researchers recognize the crucial role of RNA structures in RNA function. Despite the availability of large-scale RNA sequence data, the development of computational algorithms for RSP and RIP has faced challenges, including the complexity of RNA structures and limited training datasets. These challenges have been met with advancements in computational techniques, and the progress in RSP tools has provided a solid foundation for the development of RIP tools, enabling a deeper exploration of the intricate network of RRIs and their functional implications. This review aimed to provide a comprehensive overview of existing computational tools for both RSP and RIP, focusing on two main types of RRIs and the strategies employed to predict them. ML has also been integrated into RIP and RSP methodologies. However, it is important to note that ML-based methods cannot yet replace wet lab experiments and traditional computational approaches to obtain high-resolution RNA structures or accurate RIP. Nonetheless, the advent of deep learning technologies and high-performance hardware will foster a new generation of RIP and RSP tools with improved accuracy and running speed.

Key Points
• Bridging the Gap: This comprehensive review features the connections between RSP and RIP, underscores the importance of RNA homologues, delves into the intricacies of pseudoknots and dissects the thermodynamics of RNA folding.

Figure 1. Potential interactions in RNA molecules. (A) Possible base-pairing of nucleotides. (B) cis-only RRI (intramolecular base-pairing) within a single RNA molecule. (C) Trans RRI (intermolecular base-pairing) between two identical RNA molecules. (D) Situation in concatenation-based prediction tool, where both RRI types are involved. Inter- and intramolecular base pairs are indicated by vertical pipe symbols and arches, respectively (adapted from [25]).

Figure 2. Foundation of RNA-RNA interaction prediction tools. (A) Two core strategies, namely, deterministic dynamic programming algorithm and comparative sequence analysis. (B) Venn diagram portraying the relationships between these strategies and emphasizing the overlap, demonstrating their interconnectedness.
free energy change
b) Free energy change of hybridised duplex between oligomer and target
c) Melting temperature
d) Free energy cost for opening base pairs in the region of complementarity to the target
e) Free energy change of the self-structure of unimolecular oligo
f) Free energy change of oligo-oligo dimer
g) The number of suboptimal structures of the target used before and after the binding of oligomer
h) Free energy difference between the 5′ and 3′ ends of the antisense strand of siRNA, with windows

RIP tool designed to quickly search possible hybridisation sites for a query RNA in large RNA databases as well as short interactions between two long RNAs
At least 1 FASTA file containing target and query RNA sequences or 2 CLUSTAL files as input
• Computation of optimal and suboptimal structure (one structure per line)
• Conservation profile, consensus structure, and interactions with one, two and three types of base pairs
• Types of RNAplex:
a) RNAplex-aA (accessibility and MSA as input)
b) RNAplex-cA (interaction-only and MSA as RNA)

[140]
• The first large-scale RIP tool using a seed-and-extend framework based on suffix arrays with a focus on perfect-complementary seed regions and extensions on both ends, applicable to all kinds of interaction predictions, and can be accessed via the conda package manager
RNA sequences in FASTA format
• Quick localization of potential near-complementary interactions between given query and target sequences
• A modified Smith-Waterman-Gotoh algorithm based on di-nucleotides to approximate nearest-neighbour energy parameters
• Discovery of RRIs on genome/transcriptome-wide scale
• Parallel suffix array matching and seed extension
• Prediction of siRNA off-targets, including:
a) Putative siRNA-RNA interactions
b) Intersection with transcriptomic data
c) Partition function
d) Accessibility of binding sites
e) Evaluation of siRNA off-target predictions and potential measurements
f) Relationship between inhibition efficiency and off-targeting potential of siRNAs
g) Validation of off-targeting potential measures
Multiple species
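The seed step of the seed-and-extend framework described in the table entry above can be illustrated with a minimal sketch. It is not the tool's implementation: a plain k-mer dictionary stands in for the suffix array, only Watson-Crick pairs are treated as complementary, sequences are assumed to be uppercase RNA (A, C, G, U), and the function names and the default seed length are hypothetical.

```python
from collections import defaultdict

# Watson-Crick partners only (assumption: G-U wobble pairs are ignored here).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_perfect_seeds(query, target, seed_len=6):
    """List (query_start, target_start) pairs whose seed_len-mers are
    perfectly complementary when the two strands pair antiparallel.

    A k-mer dictionary replaces the suffix array for brevity.
    """
    # Index every seed_len-mer of the target by its start position.
    index = defaultdict(list)
    for j in range(len(target) - seed_len + 1):
        index[target[j:j + seed_len]].append(j)

    seeds = []
    for i in range(len(query) - seed_len + 1):
        # The target word that would pair perfectly with this query window.
        partner = reverse_complement(query[i:i + seed_len])
        for j in index.get(partner, []):
            seeds.append((i, j))
    return seeds
```

Each reported seed would then be extended on both ends and scored with an energy model; that extension step is omitted from this sketch.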

Figure 4. Comparative RNA structure prediction (RSP) and RNA–RNA interaction prediction (RIP). (A) The three main approaches in comparative RSP: (1) align-then-fold approach; (2) fold-then-align approach; and (3) Sankoff-type approach (alignment and folding simultaneously). (B) The two main approaches in comparative RIP: (i) interaction between two alignments via an individual RIP tool and (ii) interactions obtained from the joint output of multiple individual RIP tools (adapted from [257]).

Figure 5. Timeline of RNA structure prediction (RSP) and RNA–RNA interaction prediction (RIP) tools. (A) Chronological overview of RSP and RIP tools, highlighting the different approaches via minimum free energy and comparative sequence analysis. (B) Tools involving pseudoknots and artificial intelligence.

Table 1: Interaction-only RIP tools based on MFE

Table 2: Accessibility-based RIP tools based on MFE

Table 2: Continued
• A program that calculates the thermodynamics of RRIs by assessing the probability that a potential binding site is unpaired and combining it with the interaction energy to obtain the total binding energy, making it ideal for in-depth RIP, especially when the interaction partners are known or when a candidate set has already been obtained by faster, less accurate methods
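The combination of accessibility and interaction energy described in this entry can be sketched as follows. The sketch assumes the commonly used formulation in which the cost of opening a binding site is -RT ln(P_unpaired) and the total binding energy is the hybridisation energy plus the opening costs of both partners. The function names, the example values and the default temperature of 37 °C are illustrative assumptions, not output from any specific tool.

```python
import math

# Gas constant in kcal/(mol*K).
GAS_CONSTANT_KCAL = 0.0019872


def opening_energy(p_unpaired, temperature_c=37.0):
    """Free energy cost (kcal/mol) of making a site single-stranded,
    given the probability that the site is unpaired."""
    if not 0.0 < p_unpaired <= 1.0:
        raise ValueError("p_unpaired must be in (0, 1]")
    rt = GAS_CONSTANT_KCAL * (temperature_c + 273.15)
    return -rt * math.log(p_unpaired)


def total_binding_energy(hybrid_energy, p_unpaired_query,
                         p_unpaired_target, temperature_c=37.0):
    """Hybridisation energy plus the opening costs of both binding sites."""
    return (hybrid_energy
            + opening_energy(p_unpaired_query, temperature_c)
            + opening_energy(p_unpaired_target, temperature_c))


# Hypothetical example: a duplex of -12.0 kcal/mol whose target site is
# unpaired only 1% of the time loses part of its stability to opening.
example = total_binding_energy(-12.0, p_unpaired_query=0.5,
                               p_unpaired_target=0.01)
```

A fully accessible site (probability 1) contributes no penalty, so the total reduces to the hybridisation energy alone; poorly accessible sites make the total less favourable.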

Table 3: Concatenation-based RIP tools based on MFE algorithms

Table 4: Align-then-fold RSP tools based on comparative sequence analysis

Table 7: Fold-then-align RSP tools based on comparative sequence analysis

Table 8: RIP tools based on pairwise alignment in comparative sequence analysis

Table 9: RSP and RIP tools involving pseudoknots

Table 10: Artificial intelligence-based RIP and RSP tools

• Informative Figures: Our review includes figures that elucidate RRI types, emphasise the two core strategies within RIP, simplify explanations of each strategy subtype, and present chronological timelines that trace the evolution of RSP and RIP tools.
• Comprehensive Summary: A comprehensive summary of RSP and RIP tools, meticulously organised into detailed tables for each strategy type, is available. These tables encompass characteristics of the RSP and RIP tools, citations, concise definitions and functions, input and output specifications, applicable species, and status (active or inactive) for enhanced clarity.
• Challenges and Future Directions: We highlight five primary challenges in RSP and RIP and elaborate on how the integration of artificial intelligence through machine learning and deep learning holds the potential to significantly enhance RSP and RIP.
• Practical Recommendations: A dedicated section is included to offer valuable advice for the effective utilisation of RSP and RIP tools in various research applications.